
    Creating multimedia dictionaries of endangered languages using LEXUS

    This paper reports on the development of a flexible web-based lexicon tool, LEXUS. LEXUS is targeted at linguists involved in the documentation of endangered languages. It allows the creation of lexica within the structure of the proposed ISO LMF standard and uses the proposed concept naming conventions from the ISO data categories, thus enabling interoperability, search and merging. LEXUS also makes it possible to visualize language, since it provides functionality for including audio, video and still images in the lexicon. With LEXUS it is possible to create semantic network knowledge bases using typed relations. The LEXUS tool is free to use. Index Terms: lexicon, web-based application, endangered languages, language documentation

    LEXUS & ViCoS: From lexical to conceptual spaces

    LEXUS is a web-based lexicon tool, and the knowledge space software ViCoS is an extension of LEXUS that allows users to create relations between objects in and across lexica. LEXUS and ViCoS are part of the Language Archiving Technology software, developed at the MPI for Psycholinguistics to archive and enrich linguistic resources collected in the framework of language documentation projects. LEXUS is of primary interest for language documentation: it allows users not only to create a digital dictionary but also to build multimedia encyclopedic lexica. ViCoS provides an interface between the lexical space and the ontological space. Its approach lets users model a world of concepts and their interrelations based on the categorization patterns of the speech community. We describe the LEXUS and ViCoS functionalities using three cases from DoBeS language documentation projects.
    (1) Marquesan: The Marquesan lexicon was initially created in Toolbox and imported into LEXUS using the Toolbox import functionality. The lexicon is enriched with multimedia to illustrate the meaning of the words in their cultural environment. Members of the speech community consider words as keys to access and describe relevant parts of their life and traditions. Their understanding of words is best described by the various associations the words evoke rather than in terms of any formal theory of meaning. Using ViCoS, a knowledge space of related concepts is being created.
    (2) Kola Sámi: Two lexica are being created in LEXUS. RuSaDic is a Russian-Kildin wordlist whose entries have relatively limited structure and content. SaRuDic is a lexicon with a more complex structure and much richer content, including multimedia fragments and derivations. Using ViCoS, we have created a connection between the two lexica, so that speakers who are familiar with Russian and wish to revitalize their Kildin can enter the lexicon through RuSaDic and from there approach the more informative SaRuDic. Similarly, we will create relations from the two lexica to external open databases such as Álgu.
    (3) Beaver: A speaker database including kinship relations has been created and imported into LEXUS. The LEXUS views display the relations for individual speakers. Using ViCoS, the relational information from the database will be extracted to form a kinship relation space with specific relation types, such as 'mother-of'. The whole set of relations from the database can be displayed in one ViCoS relation window, and zoom functionality is available.
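The kinship relation space described above can be pictured as a set of typed, directed edges between entries. The following Python sketch is purely illustrative and is not the ViCoS implementation. The class names, the speaker identifiers and the 'sibling-of' type are hypothetical; only the 'mother-of' type comes from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Relation:
    source: str
    rel_type: str
    target: str


class RelationSpace:
    """Hypothetical in-memory store of typed, directed relations between entries."""

    def __init__(self) -> None:
        self._outgoing = defaultdict(list)

    def add(self, source: str, rel_type: str, target: str) -> None:
        self._outgoing[source].append(Relation(source, rel_type, target))

    def targets(self, source: str, rel_type: Optional[str] = None) -> List[str]:
        # Follow outgoing edges, optionally restricted to a single relation type.
        return [r.target for r in self._outgoing.get(source, [])
                if rel_type is None or r.rel_type == rel_type]

    def all_relations(self) -> List[Relation]:
        # The whole set of relations, e.g. for display in one overview window.
        return [r for rels in self._outgoing.values() for r in rels]


# Hypothetical rows as they might be extracted from a speaker database.
rows = [
    ("speaker_01", "mother-of", "speaker_02"),
    ("speaker_01", "mother-of", "speaker_03"),
    ("speaker_02", "sibling-of", "speaker_03"),
]
space = RelationSpace()
for src, rel, tgt in rows:
    space.add(src, rel, tgt)
```

Here `space.targets("speaker_01", "mother-of")` returns the two children of `speaker_01`. Keeping the relation type on each edge lets the same store hold kinship links alongside other typed relations between lexical entries.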

    Ensuring semantic interoperability on lexical resources

    In this paper, we describe a unifying approach to tackling data heterogeneity issues for lexica and related resources. We present LEXUS, our software that implements the Lexical Markup Framework (LMF) to uniformly describe and manage lexica of different structures. LEXUS also makes use of a central Data Category Registry (DCR) to address terminological issues with regard to linguistic concepts, as well as the handling of working and object languages. Finally, we report on ViCoS, a LEXUS extension that provides support for the definition of arbitrary semantic relations between lexical entries or parts thereof.

    ISOcat: Remodeling metadata for language resources

    The Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands, is creating a state-of-the-art web environment for the ISO TC 37 (terminology and other language and content resources) metadata registry. This Data Category Registry (DCR) is called ISOcat and encompasses data categories for a broad range of language resources. Under the governance of the DCR Board, ISOcat provides an open work space for creating data category specifications, defining Data Category Selections (DCSs), which are domain-specific groups of data categories, and standardising selected data categories and DCSs. The designers also envisage future interactivity among the DCR, reference registries and an ontological knowledge space.

    An API for accessing the data category registry

    Central ontologies are increasingly important for managing interoperability between different types of language resources. This was the reason ISO set up a new subcommittee, ISO TC 37/SC 4, to handle language resource management issues. Central to the work of this subcommittee is the definition of a framework for a central registry of data categories that are important in the domain of language resources. This paper describes an application programming interface that was designed to request services from this data category registry. The DCR is operational, and the described API has already been tested from a lexicon application.
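The abstract does not specify the API itself, so the following Python sketch does not reproduce it. It only illustrates, under assumed names, the kind of service a lexicon application might request from a data category registry: looking up a category by identifier, or by its name in a given language, so that differently labelled fields in two lexica can be recognized as the same concept. The `InMemoryRegistry` class, the identifier `dc:example-pos`, the field labels and the definition text are all hypothetical placeholders, not ISOcat content.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataCategory:
    identifier: str                                       # hypothetical persistent identifier
    definition: str
    names: Dict[str, str] = field(default_factory=dict)  # language code -> name


class InMemoryRegistry:
    """Hypothetical local stand-in for a remote data category registry."""

    def __init__(self, categories: List[DataCategory]) -> None:
        self._by_id = {c.identifier: c for c in categories}

    def get(self, identifier: str) -> Optional[DataCategory]:
        return self._by_id.get(identifier)

    def search(self, name: str, lang: str = "en") -> List[DataCategory]:
        # Case-insensitive match on the category name in the requested language.
        wanted = name.casefold()
        return [c for c in self._by_id.values()
                if c.names.get(lang, "").casefold() == wanted]


registry = InMemoryRegistry([
    DataCategory("dc:example-pos", "placeholder definition",
                 {"en": "part of speech", "nl": "woordsoort"}),
])

# Two lexica label the same field differently but point to one identifier.
lexicon_a_fields = {"ps": "dc:example-pos"}
lexicon_b_fields = {"POS": "dc:example-pos", "gloss": "dc:example-gloss"}


def same_category(field_a: str, field_b: str) -> bool:
    a = lexicon_a_fields.get(field_a)
    return a is not None and a == lexicon_b_fields.get(field_b)
```

Referring to a shared identifier rather than a free-text label is what allows the two lexica to be searched or merged on the same concept, whatever each project calls the field.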

    A Federation of Language Archives Enabling Future eHumanities Scenarios

    This paper describes the need for new infrastructures for future eScience scenarios in the humanities. Three projects working on different aspects of these infrastructures are examined in detail. The first project aims to achieve a federation of archives by developing an integration layer for locating, accessing and referring to an archive's raw data objects. The other two aim to achieve interoperability at the level of the semantic interpretation of linguistic data types and tagging systems. The projects' different approaches to this problem show the trade-off between flexibility and the user's workload. Together, the three approaches give an impression of the steps necessary to arrive at an eHumanities scenario.

    Lexicon standards: From de facto standard Toolbox MDF to ISO standard LMF

    This paper discusses possible solutions for the apparent incompatibility between two standards for lexicon structure and concept naming: the de facto standard MDF, which is part of the widely used lexicon application Toolbox, and the newly accepted ISO standard LMF, ISO FDIS 24613:2008, implemented in the online lexicon tool LEXUS. The basic difference between the two standards is that in MDF the form-related and meaning-related parts of lexical entries are embedded in each other, while in LMF the two parts are strictly separated. The difference might be related to the final medium for which the standards were created: although Toolbox is a tool for digital lexicon creation, the MDF format was created for printed dictionaries, whereas LMF was created for the digital presentation of lexicon resources. At first sight the difference seems fundamental and impossible to overcome. In this paper, however, we would like to show possible solutions, test them at the LREC2010 workshop on Language Resource and Language Technology Standards, and discuss them thoroughly with a wide linguistic audience before implementing a conversion procedure in the Toolbox import module of the LEXUS tool.
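To illustrate the structural difference, here is a minimal Python sketch that reads one MDF-style record and regroups it into an LMF-like entry, with the form part kept separate from a list of senses. It is not the LEXUS Toolbox import procedure, which the abstract says is yet to be implemented. The choice of which markers count as form-related or meaning-related, the class names and the sample record are assumptions. The sketch handles only a few standard MDF markers (\lx, \ph, \ps, \sn, \ge, \de, \xv, \xe).

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed split of a few standard MDF markers into form- and meaning-related fields.
FORM_MARKERS = {"lx": "lemma", "ph": "phonetic"}
SENSE_MARKERS = {"ge": "gloss", "de": "definition",
                 "xv": "example", "xe": "example_translation"}

# A field line: backslash, marker name, optional whitespace, value.
FIELD_LINE = re.compile(r"^\\(\w+)\s*(.*)$")


@dataclass
class Sense:
    fields: Dict[str, str] = field(default_factory=dict)


@dataclass
class LexicalEntry:
    form: Dict[str, str] = field(default_factory=dict)
    part_of_speech: Optional[str] = None
    senses: List[Sense] = field(default_factory=list)


def mdf_to_entry(record: str) -> LexicalEntry:
    """Regroup one MDF record into a form part and a separate list of senses.

    Markers not listed above are ignored in this sketch.
    """
    entry = LexicalEntry()
    current: Optional[Sense] = None
    for raw in record.splitlines():
        match = FIELD_LINE.match(raw.strip())
        if match is None:
            continue  # skip blank or continuation lines in this sketch
        marker, value = match.group(1), match.group(2).strip()
        if marker in FORM_MARKERS:
            entry.form[FORM_MARKERS[marker]] = value
        elif marker == "ps":
            entry.part_of_speech = value
        elif marker == "sn":
            current = Sense()  # \sn opens a new sense
            entry.senses.append(current)
        elif marker in SENSE_MARKERS:
            if current is None:  # sense fields with no preceding \sn
                current = Sense()
                entry.senses.append(current)
            current.fields[SENSE_MARKERS[marker]] = value
    return entry


# Hypothetical placeholder record, not taken from any real lexicon.
SAMPLE = r"""\lx example-form
\ph example-phonetic
\ps n
\sn 1
\ge first gloss
\sn 2
\ge second gloss
\de placeholder definition"""

entry = mdf_to_entry(SAMPLE)
```

Because \sn starts a new sense, every gloss or definition that follows is attached to that sense. The \lx and \ph values always go to the form part, wherever they appear in the record. In this sketch, a marker repeated within one sense overwrites the earlier value.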

    Online lexicons, LEXUS and ViCoS

    An introduction to online lexicons and to tools for creating them: LEXUS and ViCoS.

    Exploring and enriching a language resource archive via the web

    The "download first, then process" paradigm is still the predominant working method in the research community. The web-based paradigm, however, offers many advantages from a tool development and data management perspective, as it allows quick adaptation to changing research environments. Moreover, new ways of combining tools and data are increasingly becoming available and will eventually enable a true web-based workflow approach, thus challenging the "download first, then process" paradigm. The infrastructure needed for managing, exploring and enriching language resources via the Web will have to be delivered by projects such as CLARIN and DARIAH.